Adds literal decoding variant with a per-stream LDS cache to coalesce memory writes through transposition. by pm4rtx · Pull Request #72 · microsoft/DirectStorage

pm4rtx · 2026-02-18T09:32:57Z

This PR makes literal decoding more memory-friendly by avoiding scattered one-byte-per-thread writes to N destination locations, one per processed stream. Instead, it accumulates four decoded bytes whose destination addresses fall within the same aligned dword, packs them into that dword, and stores the dwords from each processed stream into a per-stream cache in LDS. When the cache becomes full, or when the last full dword has been formed, the cached dwords are flushed from LDS to memory cooperatively by the entire threadgroup, which makes the writes coalesced.
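The write pattern described above can be illustrated with a small CPU-side Python simulation. This is a hypothetical model, not the shader from this PR: the names `CACHE_DWORDS` and `decode_streams_coalesced`, the lockstep stepping of all streams, dword-aligned per-stream destination offsets, and flushing every stream's cache whenever any one cache fills are all assumptions made for illustration. Huffman decoding itself is not modeled (each stream is given as already-decoded bytes), and since the PR text does not say how trailing bytes that do not form a full dword are handled, the sketch simply writes them one byte at a time.

```python
import struct

CACHE_DWORDS = 4  # hypothetical per-stream LDS cache depth, in dwords


def decode_streams_coalesced(streams, dst_offsets, out_size):
    """Simulate per-stream dword packing plus a cooperative, transposed LDS flush.

    streams:     list of bytes objects, one per stream/thread (already decoded).
    dst_offsets: dword-aligned byte offset of each stream's output (assumption).
    Returns (output bytearray, list of flush batches); each batch lists the byte
    addresses written by consecutive "threads" during one cooperative flush.
    """
    num_streams = len(streams)
    out = bytearray(out_size)
    lds = [[0] * CACHE_DWORDS for _ in range(num_streams)]  # models groupshared memory
    fill = [0] * num_streams     # dwords currently cached per stream
    flushed = [0] * num_streams  # dwords already written to memory per stream
    batches = []

    def flush():
        # Cooperative flush: flat thread index i reads lds[i // CACHE_DWORDS][i % CACHE_DWORDS]
        # (the transposition), so adjacent threads write adjacent dwords of one stream.
        addrs = []
        for i in range(num_streams * CACHE_DWORDS):
            s, k = divmod(i, CACHE_DWORDS)
            if k < fill[s]:
                addr = dst_offsets[s] + 4 * (flushed[s] + k)
                struct.pack_into("<I", out, addr, lds[s][k])
                addrs.append(addr)
        for s in range(num_streams):
            flushed[s] += fill[s]
            fill[s] = 0
        batches.append(addrs)

    max_len = max(len(data) for data in streams)
    for pos in range(0, max_len, 4):  # each step packs up to 4 bytes per stream
        for s, data in enumerate(streams):
            chunk = data[pos:pos + 4]
            if len(chunk) == 4:  # only full dwords go through the LDS cache
                lds[s][fill[s]] = int.from_bytes(chunk, "little")
                fill[s] += 1
        if any(f == CACHE_DWORDS for f in fill):  # some cache is full: flush all
            flush()
    if any(fill):
        flush()  # flush whatever full dwords remain cached

    # Trailing bytes (fewer than 4) are written individually (assumption).
    for s, data in enumerate(streams):
        for j in range(len(data) - len(data) % 4, len(data)):
            out[dst_offsets[s] + j] = data[j]
    return out, batches
```

For example, with `streams = [bytes(range(10)), bytes(range(100, 117)), b"xy"]`, `dst_offsets = [0, 64, 128]` and `out_size = 160`, the single flush batch writes byte addresses `0, 4` and then `64, 68, 72, 76`: consecutive simulated threads touch consecutive dwords of the same stream, which is the coalescing the transposition is meant to achieve.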

This new shader variant also halves the LDS used to store the Huffman table (from 2048 to 1024 dwords). This is still not ideal (768 dwords would be), but it is an improvement, and it recovers enough LDS space to hold the per-stream data cache.
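In bytes, assuming the usual 4-byte dword, the Huffman table sizes mentioned above work out as follows; halving the table frees 4 KiB of LDS per threadgroup for the per-stream cache.

```latex
\begin{aligned}
2048~\text{dwords} \times 4~\text{B} &= 8192~\text{B} = 8~\text{KiB} \quad \text{(previous table)} \\
1024~\text{dwords} \times 4~\text{B} &= 4096~\text{B} = 4~\text{KiB} \quad \text{(this PR)} \\
768~\text{dwords} \times 4~\text{B} &= 3072~\text{B} = 3~\text{KiB} \quad \text{(ideal)} \\
8192~\text{B} - 4096~\text{B} &= 4096~\text{B} \quad \text{(LDS recovered)}
\end{aligned}
```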


coopp · 2026-02-18T16:41:52Z

Getting back LDS space is goodness all around. Looks great to me.

Adds literal decoding variant with a per-stream LDS cache to coalesce memory writes through transposition.

24d2697

pm4rtx self-assigned this Feb 18, 2026

Updates fused kernel to use only 1024 dwords for Huffman table storage for consistency with other kernels.

fafeb77

pm4rtx wants to merge 2 commits into microsoft:development from pm4rtx:unroll-huffman-decode
